The Hyperdimensional Transform: a Holographic Representation of Functions

The hyperdimensional transform converts functions into high-dimensional, holographic vectors, as used in the field of hyperdimensional computing, also known as vector symbolic architectures.

This notebook supports our paper "The Hyperdimensional Transform: a Holographic Representation of Functions". It translates the theoretical concepts into simple, interpretable code. In a few lines, we show how one can implement the transform and reproduce the results in the paper. For example, the tutorial demonstrates how the transform allows one to easily solve a differential equation.

Note that this notebook is only supporting material and is not written as a stand-alone tutorial. For theory, details, explanations, and more examples, we refer the reader to our paper.

First, we include some supporting code. Then we define the domain: the interval $[0,1]$, represented by $m$ equidistant points.

In [1]:
using LinearAlgebra, Plots, StatsBase

# supporting code
# the encoding: each of the D components is a random function of x built
# from pieces of (1-cos); real-valued if real=true, bipolar (its sign) otherwise
function ϕ(x, D, l; real=true)
    ϕx = zeros(length(x), D)
    for i in 1:D
        signs = rand((-1,1), ceil(Int, 1/l)+1)
        shift = l * rand()
        ϕx[:,i] .= @. (1-cos(2π/l*(x+shift))) * signs[(ceil(Int, (x+shift)/l))]
    end
    return real ? ϕx : sign.(ϕx)
end

# compute the normalization coefficients at the discrete points via
# iteratively solving the Hammerstein equation.
function find_normalization(K; n_iter=10)
    n = sqrt.(mean(K, dims=1))[:]
    onetilde = mean(K ./ n, dims=1)[:] ./ n
    p = plot(onetilde, label="iteration 0", legend=:bottomright)
    for i in 1:n_iter
        @. n = n * sqrt(onetilde)
        onetilde .= mean(K ./ n, dims=1)[:] ./ n
        plot!(onetilde, label="iteration $i")
    end
    return p, n
end

# the normalized encoding: divide ϕ by the normalization function n
function δ(x, D, l; real)
    ϕx = ϕ(x, D, l; real)
    p, n = find_normalization(ϕx * ϕx' / D, n_iter=10)
    return ϕx ./ n
end

# finite-difference derivative along the rows of X, which sample [0,1]
function deriv_encoding(X; h=1)
    res = zero(X)
    m = size(X, 1)
    for i in 1:m
        l = max(i-h, 1)
        u = min(i+h, m)
        res[i,:] = (X[u,:] - X[l,:]) / (u - l) * m
    end
    return res
end

function plot_encoding(ϕₓ)
    p = plot(layout=(2,1), size=(600,600))
    plot!(p[1], x, ϕₓ[:,1:4], label=["ϕ₁(x)" "ϕ₂(x)" "ϕ₃(x)" "..."], xlabel="x", ylabel="random function value")
    plot!(p[2], x, ϕₓ*ϕₓ[1,:] / D, label="ϕₓ⋅ϕ(0) / D", xlabel="x", ylabel="similarity to neighbors")
    plot!(p[2], x, ϕₓ*ϕₓ[125,:] / D, label="ϕₓ⋅ϕ(0.25) / D")
    plot!(p[2], x, ϕₓ*ϕₓ[250,:] / D, label="ϕₓ⋅ϕ(0.5) / D")
    plot!(p[2], x, ϕₓ*ϕₓ[375,:] / D, label="ϕₓ⋅ϕ(0.75) / D")
    plot!(p[2], x, ϕₓ*ϕₓ[500,:] / D, label="ϕₓ⋅ϕ(1) / D")
end

function plot_normalized_encoding(δₓ)
    p = plot(layout=(2,1), size=(600,600))
    plot!(p[1], x, δₓ[:,1:4], label=["δ₁(x)" "δ₂(x)" "δ₃(x)" "..."], xlabel="x", ylabel="random function value")
    plot!(p[2], x, δₓ*δₓ[1,:] / D, label="δₓ⋅δ(0) / D", xlabel="x", ylabel="similarity to neighbors")
    plot!(p[2], x, δₓ*δₓ[125,:] / D, label="δₓ⋅δ(0.25) / D")
    plot!(p[2], x, δₓ*δₓ[250,:] / D, label="δₓ⋅δ(0.5) / D")
    plot!(p[2], x, δₓ*δₓ[375,:] / D, label="δₓ⋅δ(0.75) / D")
    plot!(p[2], x, δₓ*δₓ[500,:] / D, label="δₓ⋅δ(1) / D")
end

m = 500
x = range(0, 1, length=m);

Create an encoding

We create an encoding of the domain x: each point is mapped to a D-dimensional vector via a vector-valued function $\phi$. Each component can be seen as a random function switching randomly between 1 and -1. As a result, if points $x$ and $x'$ are close, then $\phi(x)$ and $\phi(x')$ are similar. If they are further apart, the random switching leads to dissimilar representations. The rate at which this similarity drops is connected to the average frequency at which the function switches, determined by the length scale $l$. As with kernel methods, the length scale $l$ can be seen as a kind of bandwidth parameter.

First panel: we plot a few components of the encoding $\phi$ for each point $x$ in our interval.

Second panel: we plot the similarity between the components of $\phi(x)$ and $\phi(x')$ for a few fixed values of $x'$. The similarity is measured via the dot product, rescaled by the dimensionality $D$, i.e., the number of components.

In [2]:
D, l =  20_000, 0.2;
ϕₓ = ϕ(x, D, l, real=false);
plot_encoding(ϕₓ)
Out[2]:

If we select a shorter length scale $l=0.1$, the components of $\phi$ switch more frequently and the similarity drops faster. At separations larger than the length scale, the vectors are maximally dissimilar (the similarity of random vectors).

In [3]:
D, l =  20_000, 0.1;
ϕₓ = ϕ(x, D, l, real=false);
plot_encoding(ϕₓ)
Out[3]:
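For readers without a Julia setup, the qualitative behavior just shown can be checked with a short standalone Python/NumPy sketch. The function name phi_bipolar and all parameter values here are our own illustration, not from the paper: at distance zero the similarity is exactly 1, and beyond the length scale it decays to the similarity of random vectors.

```python
import numpy as np

def phi_bipolar(xs, D, l, rng):
    """Bipolar encoding: each of the D components is a random ±1 step
    function of x that can switch sign once per length scale l."""
    out = np.empty((len(xs), D))
    n_seg = int(np.ceil(1.0 / l)) + 1
    for i in range(D):
        signs = rng.choice([-1.0, 1.0], size=n_seg)
        shift = l * rng.random()
        seg = np.ceil((xs + shift) / l).astype(int) - 1  # segment of each point
        out[:, i] = signs[seg]
    return out

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 201)
E = phi_bipolar(xs, D=20_000, l=0.2, rng=rng)

sim = E @ E[0] / E.shape[1]  # similarity of every point to x = 0
# sim[0] is exactly 1 and decays roughly linearly over one length scale
```

Points within the length scale of $x'=0$ stay similar; far-away points end up with a similarity that fluctuates around zero at scale $1/\sqrt{D}$.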

Next, we present a different way of encoding that was not included in the paper; it has a similar motivation but some interesting new properties.

Each component is now a real-valued function: a concatenation of pieces of cosine functions that are, during each period, randomly on the positive or on the negative side. Additionally, each component has a random phase shift to induce translation invariance. One period is exactly the length scale. This encoding is smoother, and even differentiable. As we will see, taking the sign of this encoding yields exactly the bipolar encoding from before.

In [4]:
D, l =  20_000, 0.1;
ϕₓ = ϕ(x, D, l, real=true);
plot_encoding(ϕₓ)
Out[4]:

Compared with the bipolar encoding, the similarity to close neighbors is a bit higher, and it drops faster for points further away.

If we take the sign of this encoding, we arrive exactly at the bipolar encoding from earlier:

In [5]:
ϕₓ = sign.(ϕₓ);
plot_encoding(ϕₓ)
Out[5]:

Create a normalized encoding

In order to have a well-defined transform, we need some kind of normalization. The integral transform integrates over the domain $[0,1]$. However, as the similarity panels above show, integration would put less weight on the boundary vectors than on those in the middle, because the boundary vectors have fewer similar neighbors.

Therefore, we introduced a normalized version of the encoding, denoted by $\delta$. It may be seen as an approximation of a Dirac delta distribution with finite length scale $l$, represented via hyperdimensional vectors. The total mass under the curves in the similarity panel should be 1. The smaller the length scale $l$, the more closely the Dirac distribution is approximated.

This normalized encoding $\delta$ is obtained by dividing the encoding $\phi$ by a function $n$ that can be found by solving a Hammerstein equation. One can observe that the amplitude of $\delta$ at the boundaries increases to compensate for the lower weight.

In [6]:
D, l =  20_000, 0.1;
δₓ = δ(x, D, l, real=false);
plot_normalized_encoding(δₓ)
Out[6]:
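The fixed-point iteration behind find_normalization can be illustrated on its own. Below is a Python/NumPy analogue under a simplifying assumption of our own: the empirical similarity matrix ϕₓϕₓᵀ/D is replaced by an idealized strictly positive kernel, and we iterate until the normalized kernel averages ("onetilde" in the Julia code) are close to 1.

```python
import numpy as np

m, l = 100, 0.1
xs = np.linspace(0.0, 1.0, m)

# idealized positive similarity kernel standing in for ϕₓϕₓᵀ/D
K = np.exp(-np.abs(xs[:, None] - xs[None, :]) / l)

# fixed-point iteration for the normalization function n:
# we want mean_i K[i, j] / (n[i] * n[j]) == 1 for every j
n = np.sqrt(K.mean(axis=0))
for _ in range(100):
    onetilde = (K / n[:, None]).mean(axis=0) / n
    n *= np.sqrt(onetilde)

# after convergence, the normalized kernel averages to ~1 everywhere;
# n is smaller at the boundaries, so δ = ϕ/n is amplified there
```

The same mechanism explains the boundary amplification in the figure above: boundary points have less neighboring mass, hence a smaller $n$ and a larger normalized amplitude.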

The transform

With our choice of domain, the transform is the integral $\int_0^1 f(x) \delta(x) \,\text{d}x$. We approximate this via our domain represented as $m$ points. The transform is then simply a matrix multiplication of the vector holding the function evaluations at all points $x$ with the matrix holding the $D$-dimensional representations of each $x$, divided by the number of points $m$.

We plot the $D$ components of $F$; they indeed look random, i.e., holographic.

In [7]:
# the function
f(x) = x * sin(20x)

# the encoding
D, l =  20_000, 0.1;
δₓ = δ(x, D, l, real=true);

# the transform
F = δₓ' * f.(x)  ./ m;

scatter(F, alpha=0.3)
Out[7]:

Evaluating the inverse transform at a point amounts to a dot product with the delta representation of that point. Note that the reproduced function is somewhat smoother due to the finite length scale. One might interpret the similarity curves above as the smoothing filters.

In [8]:
plot(x, f.(x), label="f(x)")
plot!(x, δₓ*F / D, label="F ; l=$l")
Out[8]:

Integrals

The Euclidean inner product $\frac{\langle F,F\rangle}{D}$ approximates the function inner product $\int_0^1 f(x)f(x) \,\text{d}x$.

Due to the finite length scale and the smoothing in the representation of $F$ (see the figure above), the reproduced norm is somewhat smaller.

In [9]:
n1 = F' * F / D
n2 = f.(x)' * f.(x) ./ m

n1, n2
Out[9]:
(0.13373618473057858, 0.15830467214956118)

By setting one of the arguments of the inner product equal to the constant function 1, we can compute the integral of $f$.

In [10]:
o(x) = 1
O = δₓ' * o.(x) ./ m;

i1 = F' * O ./ D
i2 = f.(x)'* o.(x) ./ m

i1, i2
Out[10]:
(-0.017169531685595537, -0.017169520211703655)
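The whole chain so far — encoding, normalization, transform, inverse transform, and inner products — fits in one standalone Python/NumPy sketch. Everything below (function names, the choices m=400 and D=5000) is our own illustration, not the paper's reference implementation; it reproduces the same qualitative behavior: a slightly smoothed reconstruction, a slightly underestimated norm, and an accurate integral.

```python
import numpy as np

rng = np.random.default_rng(1)
m, D, l = 400, 5_000, 0.1
xs = np.linspace(0.0, 1.0, m)

def phi_real(xs, D, l):
    """Real-valued encoding: pieces of (1 - cos) with a random sign per
    period and a random phase shift per component."""
    out = np.empty((len(xs), D))
    n_seg = int(np.ceil(1.0 / l)) + 2
    for i in range(D):
        signs = rng.choice([-1.0, 1.0], size=n_seg)
        shift = l * rng.random()
        seg = np.ceil((xs + shift) / l).astype(int)
        out[:, i] = (1.0 - np.cos(2.0 * np.pi / l * (xs + shift))) * signs[seg]
    return out

# normalized encoding via the fixed-point iteration for n
Phi = phi_real(xs, D, l)
K = Phi @ Phi.T / D
n = np.sqrt(K.mean(axis=0))
for _ in range(20):
    onetilde = (K / n[:, None]).mean(axis=0) / n
    n *= np.sqrt(onetilde)
Delta = Phi / n[:, None]

f = xs * np.sin(20.0 * xs)
F = Delta.T @ f / m        # the transform
f_tilde = Delta @ F / D    # the inverse transform (smoothed f)

ip_hd = F @ F / D          # hyperdimensional inner product
ip_fn = f @ f / m          # direct quadrature of the squared norm

ones = np.ones(m)
O = Delta.T @ ones / m
int_hd = F @ O / D         # integral of f via the transform
int_fn = f @ ones / m      # direct quadrature of the integral
```

The integral is recovered much more precisely than the norm: the normalization condition makes the delta representations average to the constant function 1, so smoothing does not bias the integral.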

Differentials

The derivatives of the transformed function can be computed by differentiating the encoding, i.e.,

$$\tilde{f} = \langle F, \delta\rangle$$ $$\tilde{f}' = \langle F, \delta'\rangle$$ $$\tilde{f}'' = \langle F, \delta''\rangle$$ $$\ldots$$

Below, we plot the first two derivatives of the transform of our function $f$.

The encoding:

In [11]:
D, l =  20_000, 0.1;
δₓ = δ(x, D, l, real=true)
Out[11]:
500×20000 Matrix{Float64}:
 -7.43965    0.902532    -3.39482     …  -10.099    -6.37758   7.19256
 -7.57656    1.23244     -3.80408         -9.39828  -5.44555   6.24557
 -7.66824    1.586       -4.19362         -8.7092   -4.60255   5.378
 -7.71689    1.95558     -4.56038         -8.03393  -3.84331   4.58586
 -7.72362    2.33442     -4.90129         -7.37336  -3.16271   3.86505
 -7.6907     2.71694     -5.21443     …   -6.72961  -2.5569    3.21274
 -7.61974    3.09817     -5.49792         -6.10411  -2.02226   2.62617
 -7.51247    3.4737      -5.75022         -5.49844  -1.55571   2.10305
 -7.37045    3.83952     -5.96991         -4.91411  -1.15455   1.6414
 -7.19459    4.19154     -6.15516         -4.35231  -0.816332  1.23942
 -6.98637    4.52615     -6.30467     …   -3.81479  -0.539075  0.895795
 -6.74763    4.84001     -6.41746         -3.30356  -0.321002  0.609399
 -6.47994    5.12973     -6.49236         -2.82055  -0.160438  0.379207
  ⋮                                   ⋱                        
  0.714549  -0.995224     0.0422714        4.79954  -6.84651   6.7612
  1.01923   -0.72561      0.00171431       5.28768  -6.98517   6.97206
  1.37909   -0.490571    -0.0159828   …    5.77786  -7.0865    7.14888
  1.79434   -0.295182    -0.0887176        6.26622  -7.14845   7.28907
  2.26515   -0.144767    -0.223614         6.74916  -7.16962   7.39056
  2.79154   -0.0448873   -0.424447         7.22285  -7.14859   7.4513
  3.37382   -0.00139618  -0.695171         7.68422  -7.08499   7.47021
  4.01194   -0.0204959   -1.03989     …    8.12933  -6.97777   7.4455
  4.70591   -0.108829    -1.46295          8.5543   -6.82635   7.37576
  5.45504   -0.273526    -1.96872          8.95412  -6.62964   7.25895
  6.25914   -0.522289    -2.56203          9.32486  -6.38794   7.09438
  7.11851   -0.863574    -3.24828          9.66308  -6.10229   6.88209

and the derivatives: a finite difference with width $h = m \cdot l$ ($m$: number of points in $x$, $l$: length scale) is used. A wide finite difference filters out some noise, since differentiating a noisy function amplifies the noise.

In [12]:
δₓ_ = deriv_encoding(δₓ, h = round(Int, m*l));
δₓ__ = deriv_encoding(δₓ_, h = round(Int, m*l));

F = δₓ' * f.(x)  / m;

plot(x, δₓ*F / D, label="f")
plot!(x, δₓ_*F / D / 20, label="f' / 20")
plot!(x, δₓ__*F / D / 20^2, label="f'' / 20^2")
Out[12]:
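The finite-difference derivative of the encoding can be sanity-checked in isolation. Below is a Python/NumPy analogue of deriv_encoding (the helper names are ours), applied to a known function instead of the encoding: away from the boundaries, the wide central difference reproduces the analytic derivative.

```python
import numpy as np

def deriv_encoding(X, h=1):
    """Finite-difference derivative along the first axis; rows are
    assumed to sample the interval [0, 1] on an equidistant grid."""
    m = X.shape[0]
    dx = 1.0 / (m - 1)
    res = np.zeros_like(X)
    for i in range(m):
        lo, hi = max(i - h, 0), min(i + h, m - 1)
        res[i] = (X[hi] - X[lo]) / ((hi - lo) * dx)
    return res

xs = np.linspace(0.0, 1.0, 501)
X = np.sin(2.0 * np.pi * xs)[:, None]   # one column, like one component
dX = deriv_encoding(X, h=5)

exact = 2.0 * np.pi * np.cos(2.0 * np.pi * xs)
err = np.max(np.abs(dX[50:-50, 0] - exact[50:-50]))  # interior error
```

At the boundaries the scheme falls back to one-sided differences, which are less accurate; this is why the interior is checked here.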

Solve differential equations

We show how these concepts can be used to solve a differential equation. We will use a constant $k$ and solve via linear ridge regression. For the encoding, we use only 5000 dimensions, allowing for fast matrix inversion, and a shorter length scale $l=0.05$ for enough flexibility in the solution.

In [13]:
k = 10

# solution to ridge regression; the equations are posed for ⟨·,F⟩/D,
# so the plain least-squares solution is rescaled by D
function solve_ridge(X, B; λ=1)
    F = (X'*X + λ*I) \ (X'*B)
    return F * D
end

# the encoding: fewer dimensions, shorter lengthscale
D, l =  5_000, 0.05;
δₓ = δ(x, D, l, real=true)
Out[13]:
500×5000 Matrix{Float64}:
 -0.154262    -0.971031    -12.4322      …  -15.024        2.90422
 -0.678283    -1.85191     -12.0725         -13.4861       3.98655
 -1.43043     -2.82006     -11.5346         -11.9557       4.9989
 -2.31407     -3.80986     -10.8729         -10.4692       5.91664
 -3.26486     -4.77748     -10.1088          -9.03067      6.7215
 -4.24013     -5.69574      -9.26445     …   -7.64941      7.40574
 -5.19984     -6.53322      -8.34062         -6.32064      7.94917
 -6.11131     -7.26555      -7.35139         -5.05591      8.33904
 -6.94755     -7.87371      -6.31697         -3.87389      8.56774
 -7.67466     -8.33051      -5.25331         -2.79291      8.6187
 -8.24522     -8.59632      -4.17853     …   -1.83681      8.4654
 -8.61495     -8.63884      -3.12769         -1.04135      8.09356
 -8.74672     -8.43683      -2.14788         -0.446871     7.5036
  ⋮                                      ⋱                
  8.37747      7.45616      -0.659054         0.00285793  -5.90668
  7.60238      6.45712      -0.193015         0.185477    -4.75593
  6.68218      5.3795       -0.00385508  …    0.642318    -3.62455
  5.66784      4.27925       0.100852         1.35529     -2.56831
  4.60791      3.20775       0.487859         2.30502     -1.63628
  3.5421       2.20883       1.16518          3.46741     -0.871604
  2.51646      1.33148       2.12777          4.81148     -0.320433
  1.58766      0.63162       3.36805     …    6.30765     -0.0314682
  0.814156     0.166799      4.87075          7.91562      0.0539108
  0.262748     2.65238e-5    6.61138          9.58615      0.438267
  0.00820337   0.198394      8.5478          11.2522       1.23331
 -0.131447     0.827417     10.5935          12.802        2.47469

and the derivatives.

In [14]:
δₓ_ = deriv_encoding(δₓ, h = round(Int, m*l/5));
δₓ__ = deriv_encoding(δₓ_, h = round(Int, m*l/5));

Exponential decay

The differential equation $$ \frac{1}{k}f'(x) + f(x) = 0 $$ can, using the aforementioned expression of the derivative, be expressed for $F$ as $$ \langle\frac{1}{k}\delta_x' + \delta_x, F\rangle = 0 \,. $$

When expressed for the $m$ points in our domain, the equation takes the form $X F = Y$, with $X$ and $Y$ computed as

In [15]:
X = 1 / k * δₓ_ + δₓ;
Y = zeros(m);

Boundary conditions can be added by concatenation. E.g., $f(0)=1$ is translated into an additional equation in $F$ via $\langle F, \delta_0\rangle = 1$.

In [16]:
X = vcat(X, δₓ[1,:]')
Y = vcat(Y, 1);

Now we solve via ridge regression, fitting the equations as well as possible. The solution is plotted below.

In [17]:
F = solve_ridge(X,Y, λ=1);
plot(x, c=:blue, exp.(-k*x), label="analytic solution", legend=:bottomright, xlabel="x")
plot!(x, δₓ*F / D, linestyle=:dash, linewidth=3, label="approximation")
Out[17]:
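The recipe — stack the ODE rows, append the boundary condition, solve in a regularized least-squares sense — does not depend on the hyperdimensional machinery itself. As a point of comparison, here is the same pattern in a standalone Python/NumPy sketch with a plain finite-difference derivative operator in place of $\delta'$ (all names and parameter values are our own illustration):

```python
import numpy as np

k, m = 10.0, 200
xs = np.linspace(0.0, 1.0, m)
dx = xs[1] - xs[0]

# finite-difference first-derivative operator (central inside, one-sided at ends)
Dmat = np.zeros((m, m))
for i in range(m):
    lo, hi = max(i - 1, 0), min(i + 1, m - 1)
    Dmat[i, lo] -= 1.0 / ((hi - lo) * dx)
    Dmat[i, hi] += 1.0 / ((hi - lo) * dx)

# the ODE (1/k) f' + f = 0 at every grid point...
X = Dmat / k + np.eye(m)
Y = np.zeros(m)

# ...plus the boundary condition f(0) = 1 as one extra appended equation
X = np.vstack([X, np.eye(m)[:1]])
Y = np.append(Y, 1.0)

# ridge / regularized least squares, as in solve_ridge
lam = 1e-8
f = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ Y)

err = np.max(np.abs(f - np.exp(-k * xs)))  # compare to the analytic solution
```

The hyperdimensional version replaces the sparse finite-difference rows with the dense rows $\frac{1}{k}\delta_x' + \delta_x$, but the linear-algebra pattern is identical.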

Harmonic oscillator

The differential equation $$ \frac{1}{k^2} f'' + f = 0 $$ can, using the aforementioned expression of the derivative, be expressed for $F$ as $$ \langle\frac{1}{k^2}\delta_x'' + \delta_x, F\rangle = 0 \,. $$

Now, we solve this equation analogously to the above.

In [18]:
# equation
X = 1/k^2*δₓ__ + δₓ
Y = zeros(m)

# boundary condition: f(0) = 1
X = vcat(X, δₓ[1,:]')
Y = vcat(Y, 1)

# solve
F = solve_ridge(X, Y, λ=1);

# plot
plot(x, cos.(k*x), label="analytic solution", xlabel="x")
plot!(x, δₓ*F / D, linestyle=:dash,linewidth=3, label="approximation")
Out[18]: